# Multilingual Speech Recognition
Whisper Small
Apache-2.0
Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680,000 hours of labeled data, with strong generalization capabilities.
Speech Recognition Supports Multiple Languages
unsloth
50
1
Whisper Large V3 Turbo
MIT
Whisper is OpenAI's state-of-the-art automatic speech recognition (ASR) and speech translation model, trained on over 5 million hours of labeled data with strong zero-shot generalization capabilities. The Turbo version is a pruned and fine-tuned variant of Whisper large-v3 that reduces the decoder from 32 layers to 4, significantly improving speed at a slight cost in quality.
Speech Recognition
Transformers Supports Multiple Languages
unsloth
94
1
Whisper Large V3
Apache-2.0
Whisper is OpenAI's state-of-the-art automatic speech recognition (ASR) and speech translation model, supporting multiple languages.
Speech Recognition
Safetensors Supports Multiple Languages
unsloth
4,002
1
Quantum STT
Apache-2.0
Quantum_STT is an advanced automatic speech recognition (ASR) and speech translation model, trained with large-scale weak supervision, supporting multiple languages and tasks.
Speech Recognition
Transformers Supports Multiple Languages
sbapan41
100
1
Whisper Large V3 Turbo Gguf
MIT
Whisper large-v3-turbo is a pruned and fine-tuned version based on Whisper large-v3, with the decoder layers reduced from 32 to 4, significantly improving speed while slightly reducing quality.
Speech Recognition Supports Multiple Languages
xkeyC
546
1
Canary 180m Flash
NVIDIA NeMo Canary Flash is a multilingual multitask speech model supporting automatic speech recognition and translation tasks in English, German, French, and Spanish.
Speech Recognition Supports Multiple Languages
nvidia
15.17k
60
Whisper Large V3.w4a16
Apache-2.0
This is a quantized version of openai/whisper-large-v3 that uses INT4 weights and FP16 activations (W4A16), suitable for vLLM inference.
Speech Recognition
Transformers English
nm-testing
20
1
Owls 4B 180K
OWLS is a suite of Whisper-style models designed to help researchers understand the scaling properties of speech models, supporting multilingual speech recognition and translation.
Speech Recognition Other
espnet
40
5
Whisper Large V3 Distil Multi4 V0.2
MIT
This is a multilingual distilled version of the Whisper model with 2 decoder layers, supporting 4 European languages: English, French, Spanish, and German.
Speech Recognition
Transformers Supports Multiple Languages
bofenghuang
70
1
Voice Clone Large Finetune Final
Apache-2.0
This model is fine-tuned from openai/whisper-large-v3. It is described as a voice cloning model but is primarily used for speech recognition, achieving a word error rate of 15.3572 on the evaluation set.
Speech Recognition
Transformers
neuronbit
37
2
Faster Whisper Large V3 Turbo Ct2
MIT
This is a version of the Whisper large-v3 turbo model converted to the CTranslate2 format for efficient automatic speech recognition tasks.
Speech Recognition Supports Multiple Languages
deepdml
254.96k
128
Whisper Large V3 Turbo
MIT
Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.
Speech Recognition
Transformers Supports Multiple Languages
openai
4.0M
2,317
Whisper Large V3 Gguf
Apache-2.0
Whisper is a multilingual automatic speech recognition (ASR) system that supports speech-to-text tasks in multiple languages.
Speech Recognition Supports Multiple Languages
vonjack
931
14
Faster Whisper Large V3 Ja
MIT
A Japanese-optimized version of OpenAI Whisper large-v3 that also supports multilingual speech recognition.
Speech Recognition Supports Multiple Languages
JhonVanced
46
3
Mms 1b Fl102
MMS-1B-FL102 is part of Facebook's Massively Multilingual Speech project, an automatic speech recognition model supporting 102 languages, based on the 1-billion-parameter Wav2Vec2 architecture, achieving multilingual transcription through adapter technology.
Speech Recognition
Transformers Supports Multiple Languages
facebook
6,360
26
Mms 1b All
MMS-1B-All is part of Facebook's Massively Multilingual Speech project and supports automatic speech recognition for 1,162 languages.
Speech Recognition
Transformers Supports Multiple Languages
facebook
108.10k
140
Faster Whisper Small
MIT
A Transformer-based automatic speech recognition (ASR) model supporting multilingual transcription.
Speech Recognition Supports Multiple Languages
guillaumekln
4,599
15
Whisper Base
Apache-2.0
Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680k hours of labeled data with strong generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
491.35k
216
Whisper Tiny
Apache-2.0
Whisper Tiny is an automatic speech recognition (ASR) model developed by OpenAI, the smallest version in the Whisper series with 39M parameters.
Speech Recognition Supports Multiple Languages
openai
328.82k
318
M Ctc T Large
Apache-2.0
A large-scale multilingual speech recognition model introduced by Meta AI, supporting 60 languages, based on a 1-billion-parameter Transformer encoder architecture.
Speech Recognition
Transformers English
speechbrain
88
20
Mctct Large
Apache-2.0
A large-scale multilingual speech recognition model introduced by Meta AI, featuring 1 billion parameters and supporting character-level transcription for 60 languages.
Speech Recognition
Transformers English
cwkeam
21
0
Xtreme S Xlsr Minds14
Apache-2.0
This model is a speech processing model fine-tuned from facebook/wav2vec2-xls-r-300m, achieving high F1 scores and accuracy on the evaluation dataset.
Speech Recognition
Transformers
anton-l
25
1
Wav2vec2large Xlsr Akan
This is a universal voice model supporting speech recognition and audio processing tasks.
Speech Recognition Other
azunre
2,834
0
Xlrs 53 Finnish
Apache-2.0
XLSR-Wav2Vec2 is a multilingual speech recognition model that learns shared speech representations through cross-lingual pretraining, supporting 53 languages.
Speech Recognition Other
vneralla
32
0
Wav2vec2 Xlsr Multilingual 56
Apache-2.0
This is a multilingual automatic speech recognition (ASR) model supporting 56 languages, fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
Speech Recognition
Transformers Supports Multiple Languages
voidful
21.69k
30
Lang Id Commonlanguage Ecapa
Apache-2.0
A spoken language identification model using the ECAPA-TDNN architecture, supporting identification of 45 languages.
Audio Classification Supports Multiple Languages
speechbrain
190
36
Lang Id Voxlingua107 Ecapa
Apache-2.0
A speech language identification model based on the SpeechBrain framework and ECAPA-TDNN architecture, supporting recognition and speech embedding extraction for 107 languages.
Audio Classification Supports Multiple Languages
speechbrain
330.01k
112
Wav2vec2 Large Xlsr Hindi Marathi
Apache-2.0
Fine-tuned from Facebook's wav2vec2-large-xlsr-53 model, supporting automatic speech recognition for Hindi and Marathi.
Speech Recognition
Transformers Other
tanmaylaud
76
0